@bosconi bosconi commented Jan 14, 2026

Summary

  • Add timing instrumentation around future polls in AsyncOperatorBuilder
  • Log a warning when a single poll exceeds a 10ms threshold
  • Include the operator address and global_id in warnings so the slow operator can be identified
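To make the instrumentation idea concrete, here is a minimal std-only Rust sketch of timing each `poll` of a wrapped future and flagging polls over 10ms. It is not the PR's code. `TimedFuture`, `poll_once`, `NoopWaker`, and the field names are hypothetical, and `eprintln!` stands in for whatever logging the project actually uses.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Polls slower than this are reported (10ms, as in the PR description).
const SLOW_POLL_THRESHOLD: Duration = Duration::from_millis(10);

/// Hypothetical wrapper that times every `poll` of the inner future.
struct TimedFuture<F> {
    inner: Pin<Box<F>>,
    /// Operator identification (illustrative field names).
    address: Vec<usize>,
    global_id: String,
    /// Number of polls that exceeded the threshold.
    slow_polls: usize,
}

impl<F: Future> TimedFuture<F> {
    fn new(inner: F, address: Vec<usize>, global_id: &str) -> Self {
        TimedFuture {
            inner: Box::pin(inner),
            address,
            global_id: global_id.to_string(),
            slow_polls: 0,
        }
    }
}

impl<F: Future> Future for TimedFuture<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        // `TimedFuture` is `Unpin` because the inner future is boxed.
        let this = self.get_mut();
        let start = Instant::now();
        let result = this.inner.as_mut().poll(cx);
        let elapsed = start.elapsed();
        if elapsed > SLOW_POLL_THRESHOLD {
            this.slow_polls += 1;
            // Stand-in for the project's structured logging.
            eprintln!(
                "slow poll: operator address={:?} global_id={} took {:?} (threshold {:?})",
                this.address, this.global_id, elapsed, SLOW_POLL_THRESHOLD
            );
        }
        result
    }
}

/// Waker that does nothing; enough to drive a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the wrapped future exactly once.
fn poll_once<F: Future>(fut: &mut TimedFuture<F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

An `async` block that calls `std::thread::sleep` for 20ms before returning is flagged on its first poll. This is the "synchronous work before an await point" case from the motivation. Note that the wrapper only detects a slow poll after the worker thread has already been blocked, so it provides visibility, not mitigation.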

Motivation

Async operators running in the timely context can block the worker thread if they do significant synchronous work before hitting an await point. This can prevent heartbeat tasks from running and cause persist reader lease expirations.

The proper fix (swapping to channel-based communication with tokio tasks) is acknowledged as complex in the existing TODO. This PR adds visibility into the problem while that architectural work is planned.
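The following is a rough sketch of the direction the TODO describes: move expensive work off the worker thread and have the operator only drain a channel. To keep it dependency-free, it substitutes a `std::thread` and `std::sync::mpsc` for tokio tasks and channels. `spawn_background_work` and `operator_step` are hypothetical names, and this is not the planned Materialize design.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a spawned tokio task: the expensive,
/// synchronous work runs on its own thread and results come back over a channel.
fn spawn_background_work(inputs: Vec<u64>) -> Receiver<u64> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for x in inputs {
            // Simulated expensive synchronous work.
            thread::sleep(Duration::from_millis(5));
            if tx.send(x * x).is_err() {
                // The operator dropped its receiver; stop early.
                break;
            }
        }
        // `tx` is dropped here, which disconnects the channel.
    });
    rx
}

/// One hypothetical operator activation: drain ready results without
/// blocking. Returns `true` once the background side has finished and
/// every result has been received.
fn operator_step(rx: &Receiver<u64>, out: &mut Vec<u64>) -> bool {
    loop {
        match rx.try_recv() {
            Ok(value) => out.push(value),
            Err(TryRecvError::Empty) => return false,
            Err(TryRecvError::Disconnected) => return true,
        }
    }
}
```

`operator_step` only ever calls `try_recv`, so its cost per activation is bounded by the number of ready results rather than by the cost of the work itself. Between activations the worker thread remains free for other tasks, such as heartbeats.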

Test plan

  • cargo check -p mz-timely-util passes
  • cargo test -p mz-timely-util passes (18/18 tests)
  • cargo test -p mz-persist-client --lib passes (121/121 tests)

Generated with Claude Code

Async operators running in the timely context can block the worker
thread if they do significant synchronous work before hitting an
await point. This can prevent heartbeat tasks from running and
cause persist reader lease expirations.

This change adds instrumentation to detect slow polls:
- Track the duration of each future poll
- Log a warning when a poll exceeds the 10ms threshold
- Include operator address and global_id in the warning

This provides visibility into problematic operators while the more
invasive architectural fix (channel-based communication with tokio
tasks) is planned for future work.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@bosconi bosconi requested a review from a team as a code owner January 14, 2026 12:18
@antiguru antiguru left a comment

I don't think this is the right approach. We have introspection data that contains the information we need to diagnose which operators run for how long, and we should use this instead of inventing a new mechanism. (With the caveat that the introspection doesn't apply to sources/sinks, but that's a separate problem we should fix.)
